15 research outputs found

    Finding similar neighborhoods across cities by mining human urban activity

    We propose a method to match similar neighborhoods across different cities. That is, given a measure of similarity between urban regions and a query region in one city, our goal is to find the region in another city that minimizes the distance to the query region. Furthermore, we seek to do so efficiently, as evaluating the distance to every possible candidate region is prohibitive. First, we collect traces of activity in 20 European and American cities from the location-aware social platforms Foursquare and Flickr. A thorough exploration of this dataset leads us to describe individual venues by relevant features, including their aggregate activity across time, their visitors and overall popularity, and the typology of their surroundings. Then we learn several measures of venue similarity in a semi-supervised setting and evaluate their performance on two information retrieval tasks. After gathering human ground truth about neighborhoods, we evaluate different metrics between sets of venues and find that the Earth Mover’s Distance is best suited to assessing neighborhood similarity. Finally, we address the computational efficiency problem of finding the most similar neighborhood given a query. We devise a heuristic search strategy and show that it provides results of comparable quality while being orders of magnitude faster. This work has applications in tourism recommendation and urban planning, as it provides a similarity measure between urban areas.
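
    As a rough illustration of comparing two neighborhoods as sets of venues, the sketch below computes the Earth Mover's Distance in a deliberately simplified setting. It assumes that both neighborhoods contain the same number of venues, that every venue has equal weight, and that the ground distance is Euclidean; under those assumptions the distance equals the average cost of the cheapest one-to-one matching. The function name `emd_uniform`, the two-dimensional feature vectors, and the candidate names are hypothetical, and the exhaustive search over permutations is only practical for a handful of venues. It is not the feature set or the search strategy described in the abstract.

```python
from itertools import permutations
from math import dist


def emd_uniform(venues_a, venues_b):
    """Earth Mover's Distance between two equal-size sets of feature vectors.

    Assumes uniform venue weights, so the optimal transport plan is a
    one-to-one matching. Brute force over all matchings: small inputs only.
    """
    if not venues_a or len(venues_a) != len(venues_b):
        raise ValueError("both sets must be non-empty and of equal size")
    n = len(venues_a)
    cheapest = min(
        sum(dist(a, venues_b[j]) for a, j in zip(venues_a, matching))
        for matching in permutations(range(n))
    )
    return cheapest / n


# Made-up 2-D venue features for a query neighborhood and two candidates.
query = [(0.0, 0.0), (1.0, 0.0)]
candidates = {
    "north": [(0.0, 1.0), (1.0, 1.0)],
    "far": [(5.0, 5.0), (6.0, 5.0)],
}

# Exhaustive retrieval: evaluate every candidate and keep the closest one.
closest = min(candidates, key=lambda name: emd_uniform(query, candidates[name]))
```

    Evaluating every candidate in this way is the brute-force baseline; the abstract's point is that this becomes prohibitive when the set of candidate regions is large.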

    On the Troll-Trust Model for Edge Sign Prediction in Social Networks

    In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive against state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms which have theoretical guarantees even when edges are adversarially labeled. Comment: v5: accepted to AISTATS 201
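
    The per-node fractions mentioned above can be sketched as follows. The code counts, for each node, the fraction of its outgoing edges and of its incoming edges that are negative, using only the labeled training edges. The decision rule in `predict_sign` (average the two fractions and compare with 0.5, falling back to 0.5 for unseen nodes) is an assumption made for illustration and is not claimed to be the rule analyzed in the paper; all function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def negative_fractions(train_edges):
    """Return (out_frac, in_frac): the fraction of negative edges leaving
    and entering each node. train_edges holds (source, target, sign)
    triples with sign equal to +1 or -1."""
    out_total, out_neg = defaultdict(int), defaultdict(int)
    in_total, in_neg = defaultdict(int), defaultdict(int)
    for source, target, sign in train_edges:
        out_total[source] += 1
        in_total[target] += 1
        if sign < 0:
            out_neg[source] += 1
            in_neg[target] += 1
    out_frac = {u: out_neg[u] / out_total[u] for u in out_total}
    in_frac = {v: in_neg[v] / in_total[v] for v in in_total}
    return out_frac, in_frac


def predict_sign(source, target, out_frac, in_frac, default=0.5):
    """Illustrative rule (an assumption): predict a negative edge when the
    average of the two negative fractions exceeds one half."""
    score = (out_frac.get(source, default) + in_frac.get(target, default)) / 2
    return -1 if score > 0.5 else 1


# Toy directed signed graph: "d" only sends negative edges, "a" only positive.
train_edges = [
    ("a", "b", 1), ("a", "c", 1), ("a", "e", 1),
    ("d", "b", -1), ("d", "c", -1), ("d", "e", -1),
]
out_frac, in_frac = negative_fractions(train_edges)
```

    With these counts, an unlabeled edge leaving "d" is predicted negative and one leaving "a" positive, which is the kind of behavior such heuristics are meant to capture.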

    "What Is the City but the People?" Exploring Urban Activity Using Social Web Traces

    We demonstrate GeoTopics, a system to explore geographical patterns of urban activity. The system collects publicly shared check-ins generated by Foursquare users, which reveal who spends time where, when, and on what type of activity. It then employs sparse probabilistic modeling techniques to learn associations between different regions of a city and multi-feature descriptions of urban activity. Through a web interface, users of the system can select a city of interest and explore visualizations that highlight how different types of activity are spatially and temporally distributed in the city. We discuss the opportunities that web data offer to understand urban activity and the challenges one faces in that task. We then describe our approach and the architecture of GeoTopics. Finally, we lay out the demonstration scenario.

    Where is the Soho of Rome? Measures and algorithms for finding similar neighborhoods in cities

    Data generated on location-aware social media provide rich information about the places (shopping malls, restaurants, cafés, etc.) where citizens spend their time. That information can, in turn, be used to describe city neighborhoods in terms of the activity that takes place therein. For example, the data might reveal that citizens visit one neighborhood mainly for shopping, while another for its dining venues. In this paper, we present a methodology to analyze such data, describe neighborhoods in terms of the activity they host, and discover similar neighborhoods across cities. Using millions of Foursquare check-ins from cities in Europe and the US, we conduct an extensive study on features and measures that can be used to quantify similarity of city neighborhoods. We find that the earth-mover's distance outperforms other candidate measures in finding similar neighborhoods. Subsequently, using the earth-mover's distance as our measure of choice, we address the issue of computational efficiency: given a neighborhood in one city, how to efficiently retrieve the k most similar neighborhoods in other cities. We propose a similarity-search strategy that yields significant speed improvement over the brute-force search, with minimal loss in accuracy. We conclude with a case study that compares neighborhoods of Paris to neighborhoods of other cities.

    Caractérisation des arêtes dans les graphes signés et attribués

    In this thesis, we develop methods to efficiently and accurately characterize edges in complex networks. In simple graphs, nodes are connected by a single semantic. For instance, two users are friends in a social network, or there is a hypertext link from one webpage to another. Furthermore, those connections are typically driven by node similarity, in what is known as the homophily mechanism. In the previous examples, users become friends because of common features, and webpages link to each other based on common topics. By contrast, complex networks are graphs where every connection has one semantic among k possible ones. Those connections are moreover based on both partial homophily and heterophily of their endpoints. This additional information enables finer analysis of real-world graphs. However, it can be expensive to acquire, or is sometimes not known beforehand. We address the problem of inferring edge semantics in various settings. First, we consider graphs where edges have two opposite semantics, and where we observe the label of some edges. These so-called signed graphs are a convenient way to represent polarized interactions. We propose two learning biases suited for directed and undirected signed graphs respectively. This leads us to design several algorithms leveraging the graph topology to solve a binary classification problem that we call Edge Sign Prediction. Second, we consider graphs with k ≥ 2 available semantics for edges. In that case of multilayer graphs, we are not provided with any edge label, but instead are given one feature vector for each node. Faced with such an unsupervised Edge Attributed Clustering problem, we devise a quality criterion expressing how well an edge k-partition and k semantic vectors explain the observed connections. We optimize this goodness-of-explanation criterion in vector and matrix forms, and show how those two methods perform on synthetic data.

    Density of Paris Foursquare venues

    Result of Scikit Gaussian Kernel Density estimation on a set of 4008 venues in Paris.
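
    For readers unfamiliar with the technique, the following standard-library sketch shows what a two-dimensional Gaussian kernel density estimate computes. It is a stand-in written for illustration, not the Scikit call used to produce this result; the function name `gaussian_kde_2d`, the bandwidth, and the coordinates are all made up and are not the 4008 Paris venues.

```python
from math import exp, pi


def gaussian_kde_2d(points, bandwidth):
    """Return a function estimating the density at (x, y) as the average of
    isotropic Gaussian kernels centered on each point."""
    n = len(points)
    # Normalization of a 2-D isotropic Gaussian, averaged over n points.
    norm = 1.0 / (n * 2 * pi * bandwidth ** 2)

    def density(x, y):
        return norm * sum(
            exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
            for px, py in points
        )

    return density


# Made-up venue coordinates: a tight cluster plus one isolated venue.
venues = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
density = gaussian_kde_2d(venues, bandwidth=0.5)
```

    The estimated density is higher inside the cluster than around the isolated venue, which is what a venue density map visualizes.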
